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Conditional sampling for spectrally discrete 
max-stable random fields* 

Yizao Wang and Stilian A. Stoev ' 
November 29, 2010 

Abstract 



Max-stable random fields play a central role in modeling extreme 
value phenomena. We obtain an explicit formula for the conditional 
probability in general max-linear models, which include a large class of 
max-stable random fields. As a consequence, we develop an algorithm 
for efficient and exact sampling from the conditional distributions. Our 
method provides a computational solution to the prediction problem 
for spectrally discrete max-stable random fields. This work offers new 
tools and a new perspective to many statistical inference problems for 
spatial extremes, arising, for example, in meteorology, geology, and 
^T) . environmental applications. 
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O ' 1 Introduction 



1.1 Motivation 

Max-stable stochastic processes and random fields are fundamental statis- 
tical models for the dependence of extremes. This is because they arise in 
the limit of rescaled maxima. Indeed, consider the component- wise maxima 

M t (n) = max $\ t£T 

j=l,...,n 

of independent realizations {£ t }teT, j = l,...,n of a random field £ = 
{£,t}teT- If the random field {M t ■ }teT converges in law, as n — > oo, under 
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judicious normalization, then its limit X = {Xt}t£T is necessarily max-stable 
(see e.g. Resnick |19j and de Haan and Ferreira |llj). 

Therefore, the max-stable processes (random fields, resp.) are as impor- 
tant to extreme value theory as are the Gaussian processes to the classical 
statistical theory based on the central limit theorem. The multivariate max- 
stable laws and processes have been studied extensively in the past 30 years. 
See e.g. Balkema and Resnick P], de Haan [H1UJJ], de Haan and Pickands [IB] . 
Gine et al. [15] , Smith [23], Resnick and Roy [21], Davis and Resnick [6] [7], 
Stoev and Taqqu [25] . Kabluchko et al. [IB] . Wang and Stoev [28], among 
many others. 

The modeling and parameter estimation of the univariate marginal dis- 
tributions of the extremes have been studied extensively (see e.g. Davison 
and Smith [8], de Haan and Ferreira [IT], Resnick [20] and the references 
therein). Many of the recent developments in this domain focus on the 
characterization, modeling and estimation of the dependence for multivari- 
ate extremes. In this context, building adequate max-stable processes and 
random fields plays a key role. See e.g. de Haan and Pereira [12], Buishand 
et al. [2J, Schlather [22], Schlather and Tawn [23], Cooley et al. [3J, and 
Naveau et al. [17] . 

Our present work is motivated by an important and long-standing chal- 
lenge, namely, the prediction for max-stable random processes and fields. 
Suppose that one already has a suitable max-stable model for the depen- 
dence structure of a random field {X t }t&T- The field is observed at sev- 
eral locations t\ , . . . , t n G T and one wants to predict the values of the 
field X Sl , . . . ,X Sm at some other locations. The optimal predictors involve 
the conditional distribution of {Xt}t£T, given the data. Even if the finite- 
dimensional distributions of the field {Xt}t^T are available in analytic form, 
it is typically impossible to obtain a closed-form solution for the conditional 
distribution. Naive Monte Carlo approximations are not practical either, 
since they involve conditioning on events of infinitesimal probability, which 
leads to mounting errors and computational costs. 

Prior studies of Davis and Resnick [6] [7J and Cooley et al. [3J, among 
others, have shown that the prediction problem in the max-stable context 
is challenging, and it does not have an elegant analytical solution. On the 
other hand, the growing popularity and the use of max-stable processes in 
various applications, make this an important problem. This motivated us 
to seek a computational solution. 

In this work, we develop theory and methodology for sampling from 
the conditional distributions of spectrally discrete max-stable models. More 
precisely, we provide an algorithm that can generate efficiently exact hide- 



pendent samples from the regular conditional probability of (X S1 , . . . ,X Sm ), 
given the values (X^ , . . . , Xf n ) . For the sake of simplicity, we write 
X = (Xi, . . . , X n ) = (X tl , . . . , X tn ). The algorithm applies to the general 
max-linear model: 

P 
Xi = max a i; jZj = \J aijZj , i = 1, . . . , n. (1) 

i=i 

where the ffljj's are known non-negative constants and the Zj's are inde- 
pendent continuous non-negative random variables. Any multivariate max- 
stable distribution can be approximated arbitrarily well via a max-linear 
model with sufficiently large p. 

The main idea is to first generate samples from the regular conditional 
probability distribution of Z | X = x, where Z = {Zj)j=i,...j>- Then, the 
conditional distributions of 

p 
x s k = V bkjZj , 1 < k < m, 

i=i 

given X = x can be readily obtained, for any given bk,j's. In this paper, we 
assume that the model is completely known, i.e., the parameters {ajj} and 
{bkj} are given. The statistical inference for these parameters is beyond the 
scope of this paper. 

Observe that if X = x, then ([T|) implies natural equality and inequality 
constraints on the Zj's. More precisely, (pQ) gives rise to a set of so-called 
hitting scenarios. In each hitting scenario, a subset of the Zj 's equal, in other 
words hit, their upper bounds and the rest of the Zj's can take arbitrary 
values in certain open intervals. We will show that the regular conditional 
probability of Z | X = x is a weighted mixture of the various distributions 
of the vector Z, under all possible hitting scenarios corresponding to X = x. 

The resulting formula, however, involves determining all hitting scenar- 
ios, which becomes computationally prohibitive for large and even moderate 
values of p. This issue is closely related to the NP-hard set-covering problem 
in computer science (see e.g. [3]). 

Fortunately, further detailed analysis of the probabilistic structure of 
the max-linear models allows us to obtain a different formula of the regular 
conditional probability (Theorem[2]). It yields an exact and computationally 
efficient algorithm, which in practice can handle complex max-linear models 
with p in the order of thousands, on a conventional desktop computer. The 
algorithm is implemented in the R ([18] ) package maxLinear [27] . with the 



core part written in C/C++. We also used the R package fields ([H]) to 
generate some of the figures in this paper. 

We illustrate the performance of our algorithm over two classes of pro- 
cesses: the max-autoregressive moving average (MARMA) time series (Davis 
and Resnick [B]), and the Smith model (Smith [23]) for spatial extremes. 
The MARMA processes are spectrally discrete max-stable processes, and 
our algorithm applies directly. In Section 13.11 we demonstrate the predic- 
tion of MARMA processes by conditional sampling and compare our result 
to the projection predictors proposed in [6]. To apply our algorithm to the 
Smith model, on the other hand, we first need to discretize the (spectrally 
continuous) model. Section 13.21 is devoted to conditional sampling for the 
discretized Smith model. Thanks to the computational efficiency of our 
algorithm, we can choose a mesh fine enough to obtain a satisfactory dis- 
cretization. Figure Q] shows four realizations from such a discretized Smith 
model, conditioning only on 7 observations (with assumed value 5). The 
algorithm applies in the same way to more complex models. 

1.2 Multivariate Max-Stable Distributions: a Brief Review 

Consider a general max-stable process X = {X t }t£T, indexed by a set T 
(e.g. T = [0,1], K,K or Z ). We shall assume that the finite-dimensional 
distributions of X are known and the ultimate goal is to study the condi- 
tional distributions of X. For convenience and without loss of generality, 
we focus on max-stable processes X with a-Frechet marginals (a > 0), such 
that all max-linear combinations 

n 

£ = max ctjXtj = \J ajX tj , a,j > 0, tj G T, 

have the a-Frechet distribution: 

P(f <x) = exp{-af x~ a }, x G (0, oo), 

with scale coefficient o^ > 0. Any max-stable process can be related to 
such an a-Frechet process by simple transformation of the marginals (see 
e.g. (El). 

Essentially all max-stable processes {Xt}teT admit the following ex- 
tremal integral representation: 

{X t } teT ±{J f t (s)M a (ds)} t& , (2) 



Conditional sampling from the Smith model 
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Parameters:p=0,p,=1,p 2 =1 



Figure 1: Four samples from the conditional distribution of the discrete 
Smith model (see Section ET2j) . given the observed values (all equal to 5) at 
the locations marked by crosses. 



where the /t's are non- negative, measurable deterministic functions defined 
on a suitable measure space (£,//) and such that f s f"(s)/J,(ds) < oo. Here 
M a is an a-Frechet random sup-measure with control measure /i. The ex- 
tremal integral J s fdM a can be defined for all / G L a (S,fi), f > 0, as 
the limit in probability of extremal integrals of simple functions. For more 
details, see |26| and the seminal work |10| for an alternative treatment. 

The functions {ft}t<=T are called the spectral functions of the process 
{X t }teT- They determine the finite-dimensional distributions of {X t }t&T'- 

„ n 

F{X tl < Xl , ■■■ ,X tn < x n ) = exp{ - / (y f u (s)/x t yfi(ds)} , 



for all ti £ T, Xi € R + := (0,oo), i = 1, ...,n. A popular equivalent 
representation of multivariate max-stable laws is as follows: 



^{X tl < xi, ■ ■ ■ ,X tn <x n )= exp 



{-[JV^/^V)}. 



Here S™" 1 = {w = (^)f =1 € M n : < Wj < max i= i r .. in w 8 = 1} is the 
positive unit sphere in the sup-norm, and T is a unique finite measure on 
§™~~ called the spectral measure of the distribution (see e.g. p^9| fTTj). 

Any multivariate max-stable vector (Xt )? =1 can be approximated arbi- 
trarily well in probability, by discretizing the extremal integral: 

ep P 

X u = / f u (s)M a (ds)^\Ja hJ Z J , 
J s ill 

where Zj, j = l,...,p are independent standard a-Frechet variables and 
a i,j ^ 0. This is equivalent to considering multivariate max-stable vectors 
with discrete spectral measures concentrated on at most p points on the 
unit sphere §"~ . The error of approximation, moreover, can be controlled 
explicitly through convenient probability metrics (see e.g. |26j). 

In this paper, we shall focus on the class of max-stable processes: 

v 
X t := \f fyWj, t E T, 

i=i 

where the ^(i)'s are non-negative deterministic functions. These processes 
are called spectrally discrete, since their spectral measures V are discrete. By 
taking sufficiently large p's and with judicious 0j(t)'s, one can build flexible 
models that can replicate the behavior of an arbitrary max-stable process. 
From this point of view, a satisfactory computational solution must be able 
to deal with max-linear models with large p's. 

The treatment of the exact conditional distributions of general spectrally 
continuous max-stable processes requires different tools and still remains an 
open problem, to the best of our knowledge. As we shall see, the solution 
in the discrete case, although complete, is already quite involved. 

Acknowledgments. The authors were partially supported by NSF 
grant DMS-0806094 at the University of Michigan. 



2 Conditional Probability in Max-Linear Models 

2.1 Intuition and Basic Theory 

Consider the max- linear model in ([1]). We shall denote this model by: 

X = A0Z, (3) 

where A = (oij) n xp is a matrix with non-negative entries, X = (X\, . . . , X n ) 
and Z = (Zi, . . . , Z p ) are column vectors. We assume that the Zj's, j = 
1, . . . ,p, are independent non-negative random variables having probability 
densities. 

In this section, we provide an explicit formula for the regular condi- 
tional probability of Z with respect to X (see Theorem Q] below and the 
Appendix for a precise definition). We start with some intuition and nota- 
tion. Throughout this paper, we assume that the matrix A has at least one 
nonzero entry in each of its rows and columns. This will be referred to as 
Assumption A. 

Observe that if x = A z with xGR^, z G M^_, then 

< Zj < % = Zj(A,x) := min Xi/du, j = 1, . . . ,p. (4) 

l<i<n 

That is, the max-linear model ([3]) imposes certain inequality and equality 
constraints on the Zj's, given a set of observed X^s. Namely, some of the 
upper bounds Zj(A, x) in (j4|) must be attained, or hit, i.e., Zj = Zj(A, x) in 
such a way that 

•&i ^i,j(i)^j(i)^ ^ J-, ■ ■ ■ ,Tl, 

with judicious j(i) € {1, . . . ,p}. The next example helps to understand the 
inequality and equality constraints. 

Example 1 . Suppose that n = p = 3 and 




Let x = i0z for some z £ Rj_. In this case, it necessarily follows that 
x\ < X2 < 0:3. Moreover, ^ yields z = x. 

(%) If x= (1,2,3), then it trivially follows that z = z = (1,2,3), which is 
an equality constraint on z. 



(ii) If x = (1,1,3), then it follows that z\ = Z\ = 1, z<i < z 2 = 1 and 
Z3 = S3 = 3. Here, the "equality constraints" must hold for Z\ = Z\ 
and z% = Z3, w/iiZe Z2 on/y needs to satisfy the "inequality constraint" 
0<z 2 < %. 

Write 

C(A, x):={z£R p + :x=i0 z}, 

and note that the conditional distribution of Z | X = x concentrates on 
the set C(A,x). The observation in Example [1] can be generalized and 
formulated as follows. 

• Every z G C(A, x) corresponds to a set of active (equality) constraints 
Jc{l,...,p}, which we refer to as a hitting scenario of (^4,x), such 
that 

Zj = Zj(A,x), j G J and Zj < Zj(A,x), j G J c := {1, . . . ,p}\ J. (5) 

Observe that if j J, then there are no further constraints and Zj 
can take any value in [0,Zj), regardless of the values of the other 
components of the vector z G C(A, x). 

• Every value x may give rise to many different hitting scenarios J C 
{1, . . . ,p}. Let J{A,x) denote the collection of all such J's. We refer 
to J7"(j4,x) as to the hitting distribution of x w.r.t. A: 

J(A,x) = |j C {1,... ,p} : exist z G C(^,x),such that © holds j. 

To illustrate the notions of hitting scenario and hitting distribution, consider 
again Example [TJ Therein, we have J{A,x) = {{1,2,3}} in case (i), and 
J(A, x) = {{1, 3}, {1, 2, 3}} in case (ii). 

The hitting distribution J~(A,x) is a finite set and thus can always be 
identified. However, the identification procedure is the key difficulty in pro- 
viding an efficient algorithm for conditional sampling in practice. This issue 
is addressed in Section I2T21 In the rest of this section, suppose that J '(A, x) 
is given. Then, we can partition C(A, x) as follows 

C(A,x)= (J Cj(A,x), 

where 

Cj(A,x) = {z G IR^ : Zj = Zj, j G J and Zj < Zj, j J}. 



The sets Cj(A, x), J G J {A, x) are disjoint since they correspond to differ- 
ent hitting scenarios in J(A,x). Let 

r(J(A,x)) = min \J\ , (6) 

Jej(A,x) 

where \J\ is the number of elements in J. We call r{J{A,x)) the rank of 
the hitting distribution ^T(A, x). It equals the minimal number of equality 
constraints among the hitting scenarios in J~(A,x). It will turn out that 
the hitting scenarios J C J~(A,x) with \J\ > r(J'(A,x)) occur with (condi- 
tional) probability zero and can be ignored. We therefore focus on the set 
of all relevant hitting scenarios: 

J r (A,x) = {J € J(A,x) : \J\ = r(J(A,x))}. 

Theorem 1. Consider the max-linear model in (0), where Zj's are inde- 
pendent random variables with densities fz and distribution functions Fz-, 
j = l,...,p. Let A = (aij) n xp have non-negative entries satisfying As- 
sumption A and let 1Z r p be the class of all rectangles {(e, f], e, f G R+} in 

R p + . 

For all J € J(A,x), E € TZ R p , and x e R£, define 

uj(x,E) := H^ME)) II P ^ G n iW I Z i < %>> ( 7 ) 

jeJ jeJ c 

where ttj(zi, . . . , z p ) = Zj and 5 a is a unit point-mass at a. 

Then, the regular conditional probability u(x,E) of Z w.r.t. X equals: 

v{x,E)= J2 Pj(A,x)i/j(x,£7), E£K K , (8) 

Jej r (A,x) 

for F x -almost all x £ AQ (M+), where for all J G J r (A, x), 
Pj{A,x) = ^ with wj = TlzjfzjCzj) ]~[ F Zj (zj). (9) 

l^KeJr(A^) W K jGj j&JC 

In the special case when the Zj's are a-Prechet with scale coefficient 1, we 
have wj = ]1 jeJ (%)""• 

Remark 1. We state |2P only for rectangle sets E because the projections 
ttj(B) of an arbitrary Borel set B C K+ are not always Borel (see e.g. 125]). 
Nevertheless, the extension of measure theorem ensures that Formula |2P 
specifies completely the regular conditional probability. 



We do not provide a proof of Theorem Q] directly. Instead, we will first 
provide an equivalent formula for i/(x,E) in Theorem [2] in Section [2.21 an d 
then prove that z^(x, E) is the desired regular conditional probability. All 
the proofs are deferred to Section UJ The next example gives the intuition 
behind Formula ([S]). 

Example 2. Continue with Example^ 

(i) IfX. = x = (1,2,3), then z = x, J(A,x.) = {{1,2,3}}. Therefore, 
r(J~(A,x)) = 3 and Formula f#|) yields 

v(x,E) = uM,E) = ^ 1 (7r 1 (£))%(7r 2 (S))% 3 (7r3(£:)) = <5 2 (£) , 

a degenerate distribution with single unit point mass at z. 

(ii) IfX = x = (1,1,3), then, z = x, J"(Ax) = {{1, 3}, {1, 2, 3}}, and 
r(J(A,x)) = 2. Therefore, J r (A,x) = {{1,3}} and Formula (0) 
yields: 

v(x.,E) = I/ { i l3 }(x,E) = %(7Tl(E))P(Z 2 G 7T 2 (£) | Z 2 < 2 2 )% 3 (7r 3 (^)). 

In i/izs case, i/ie conditional distribution concentrates on the one- 
dimensional set {1} x (0, 1) x {3}. 

(Hi) Finally, if X = x = (1,1,1), then z = x and ^(^x) = 
{{1}, {1,2}, {1,2,3}}. Iften, J r (A,x) = {{1}} and 

3 

Kx,£) = v {1} (x,E!) = S&M^llFiZj G 7r,-(£0 I Zj < %)• 

i=2 

T/ie conditional distribution concentrates on the set {1} x (0, 1) x (0, 1). 

We conclude this section by showing that the conditional distribu- 
tions ([8|) arise as suitable limits. This result can be viewed as a heuristic 
justification of Theorem [TJ Let e > 0, consider 

C e j(A,x) : = {z£R p + : Zj G [%(l-e),%(l+e)], j £j,z k < z k (l-e),k€ J c }, 

(10) 
and set 

C e (Ax):= |J C5(A,x). (11) 

Note that the sets A (C e (A,x)) shrink to the point x, as e | 0. 

10 



Proposition 1. Under the assumptions of Theorem^ for allx G AQ(^ + j, 
we have, as e 10, 

P(Z G E | Z G C e (A, x)) — ► i/(x, £), EeTZ R P. (12) 

Proof. Recall the definition of Cj in (jlOp . Observe that for all e > 0, the 
sets {C'j(A x )}jej'( J 4,x) are mutually disjoint. Thus, writing C e = C e (A, x) 
and C} = C e j(A,x), by (dU we have 

F(Z£ E\ZeC e ) = ^P(Ze£|ZG C})F(Z G C} I Z G c e ) 

JeJ 

where the terms with P(Z G C}) = are ignored. One can see that P(Z G 
E | Z G Cj) converge to i/j(E,x) in flTJ), as e | 0. The independence of the 
Zj's also implies that 

P(Z G C}) = YlnZj G [%(1 - e),%(l + 6)]) [] P(Z fe < %{l - e)) 
je,/ fceJ c 

= n (/^(%)% • 2e + °( £ )) n ( f ^(%) + °( £ )) • ( u ) 

jeJ fceJ c 

Observe that for J G Jr(A,x), the latter expression equals 2wje' J '(l + 
o(l)), e 4- and the terms with \J\ > r will become negligible since they are 
of smaller order. Therefore, Relation (f!H|) yields (jSJ), and the proof is thus 
complete. □ 

The proof of Proposition Q] provides an insight to the expressions of the 
weights wj's in © and the components vj's in (|7|). In particular, it explains 
why only hitting scenarios of rank r are involved in the expression of the 
conditional probability. The formal proof of Theorem Q] however, requires 
a different argument. 

2.2 Conditional Sampling: Computational Efficiency 

We discuss here important computational issues related to sampling from the 
regular conditional probability in (J8|) . It turns out that identifying all hitting 
scenarios amounts to solving the set covering problem, which is NP-hard (see 
e.g. [3]). The probabilistic structure of the max-linear models, however, will 



11 



lead us to an alternative efficient solution, valid with probability one. In par- 
ticular, we will provide a new formula for the regular conditional probability, 
showing that Z can be decomposed into conditionally independent vectors, 
given X = x. As a consequence, with probability one we are not in the 'bad' 
situation that the corresponding set covering problem requires exponential 
time to solve. Indeed, this will lead us to an efficient and linearly-scalable 
algorithm for conditional sampling, which works well for max-linear models 
with large dimensions n x p arising in applications. 

To fix ideas, observe that Theorem Q] implies the following simple algo- 
rithm. 

Algorithm I: 

1. Compute £j for j = 1, . . . ,p. 

2. Identify J~(A, x), compute r = r(J'(A, x)) and focus on the set of 
relevant hitting scenarios J r = J r {A, x). 

3. Compute {wj} j£jr and {pj}j e j r - 

4. Sample Z ~ z^(x, •) according to ([8]). 

Step [Q is immediate. Provided that Step [2] is done, Step [3] is trivial and, 
Step H] can be carried out by first picking a hitting scenario J G J r {A, x) 
(with probability pj(A,x)), setting Zj = Zj, for j £ J and then resampling 
independently the remaining Zj's from the truncated distributions: Zj \ 
{Zj <%}, for all j G {l,...,p}\ J. 

The most computationally intensive aspect of this algorithm is to identify 
the set of all relevant hitting scenarios ^T r (A, x) in Step [2j This is closely 
related to the NP-hard set covering problem in theoretical computer science 
(see e.g. [3]), which is formulated next. Let H = (hij) nxp be a matrix of 
O's and l's, and let c = {cj) v - =1 € 71?+ be a p-dimensional cost vector. For 
simplicity, introduce the notation: 

(m) = {1,2, ... ,m}, m G N. 

For the matrix H, we say that the column j G (p) covers the row i G (n), 
if hi j = 1. The goal of the set-covering problem is to find a minimum-cost 
subset J C (p), such that every row is covered by at least one column j G J. 
This is equivalent to solving 



mm 






Cj5j , subject to \, hij$j ^ 1 1* £ ( n ) ■ (15) 
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We can relate the problem of identifying J r {A, x) to the set covering problem 
by defining 

Ki = 1 {a lJ z J =x l }, (16) 

where A = (aij) nxp and x = (xj)™ =1 are as in ([3]), and Cj = l,j 6 (p). It is 
easy to see that, every J € J r (A,x) corresponds to a solution of (fT5|) . and 
vice versa. Namely, for {5j}j^u,\ minimizing (fT5j) . we have J = {j G (p) : 
6 j = l}eJr(A,x). 

The set j7" r (A,x) corresponds to the set of all solutions of (fT5]h which 
depends only on the matrix H. Therefore, in the sequel we write J r {H) for 
J r {A,-x), and 

H={hij) nxp =m{A,x), (17) 

with hi j as in (|16p will be referred to as the hitting matrix. 

Example 3. Recall Example The following hitting matrices correspond 
to the three cases of x discussed therein: 

10 

ffW = f o 1 I , F (ii) 
1 

Observe that solving for J T {H} is even more challenging than solving the 
set covering problem (1151) . where only one minimum-cost subset J is needed, 
and often an approximation of the optimal solution is acceptable. Here, we 
need to identify exhaustively all J's such that (I15p holds. Fortunately, this 
problem can be substantially simplified, thanks to the probabilistic structure 
of the max-linear model. 

We first study the distribution of H. In view of (fTT|) . we have that 

H = W(A, X), with X = A Z, is a random matrix. It will turn out 

that, with probability one, H has a nice structure, leading to an efficient 

conditional sampling algorithm. 

For any hitting matrix H, we will decompose the set (p) = {1, . . . ,p} into 

f s \ 

a certain disjoint union (p) = Us=i ^ • ^ ne vec tors (Zj) . _(„) , s = 1, . . . , r 

will turn out to be conditionally independent (in s), given X = x. Therefore, 
i/(x, E) will be expressed as a product of (conditional) probabilities. 

We start by decomposing the set (n) = {1, ... , n}. First, for all ii,i2 € 

(n) ,j € (p), we write ii ~ «2 j ^ hi 1 .j = hi 2 j = 1. Then, we define an 
equivalence relation on (n): 

&1 ~ l 2 , if Zl = lQ ~ n ~ • • • ~ «m = 12 , (18) 
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with some m < n,ii = io, h , ■ ■ ■ , i m = *2 € (p) ,ji,---,j m € (p) ■ That is, 

j 

~ is the transitive closure of '~'. Consequently, we obtain a partition of 
(n), denoted by 



(n) = (J I s , (19) 



where I s , s = 1, . . . , r are the equivalence classes w.r.t. (fl8j) . Based on ([19]) , 
we define further 

J (s) = {j € <p) : fej = 1 for alU G J s } , (20) 

J = jj € (p) : h iyj = 1 for some i G I s \ . (21) 

The sets {JW ; J } se / r \ will determine the factorization form of v(x.,E). 

Theorem 2. Lei Z 6e as in Theorem d Lei afeo H be the hitting matrix 

corresponding to (A, X) u>i£/i X = A Z ; and {</'% J }se(r) ^ e the sets 
defined in i20\) and \21}) . Then, with probability one, we have 



(i) r = r{J(A,X)), 

(ii) for all J C (p), J € J r {A, AQZ) if and only if J can be written as 
J = {jl,...,j r } with j s eJ^,se(r), (22) 

(Hi) for v(x, E) defined in |2[), 

v(X,E) = f[u(°\X,E) with „M(X,i5) = Zjeji^HMH^m 
s =i *Ljej(s)Wj ( x ) 

(23) 
where for all j G J^ , 

u>«(x) := %/z,-(%) II F ^k), (24) 



fcGJ (s) \{j} 



0) 



f^E) := <5 %(E) (%) H nZ k en k (E)\Z k <z k ), (25) 



fceJ (s) \{j} 



wrat/i £,- = %(x) as in (0). 
The proof of Theorem [2] is given in Section 01 
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Remark 2. Note that this result does not claim that i/(x,E) in Ii23\) is the 
regular conditional probability. It merely provides an equivalent expression 
for fiB\), which is valid with probability one. We still need to show that {3p, 
or equivalently i23\) . is indeed the regular conditional probability. 

From (|24p and (|25p . one can see that v^ s ' is the conditional distribution of 
(Zj) . ~(s). Therefore, Relation (f23l) implies that {(Zj) . -r0)} s6 ( r ), as vectors 
indexed by s, are conditionally independent, given X = x. This leads to the 
following improved conditional sampling algorithm: 

Algorithm II: 

1. Compute Zj for j = 1, . . . ,p and the hitting matrix H = M(A, x). 

2. Identify {jW, J {s) } se(r ) by (J2DJ1 and (|2Tj). 



rsG(r) 



3. Compute {w- s) } 7 - eJ(s) for all s G (r) by (|23jl. 



4. Sample (Zj) . _( s ) | X = x ~ i/' s )(x, •) independently for s = 1, 



jeJ 



, r. 



5. Combine the sampled (Zj) . -( s) , s = 1, . . . , r to obtain a sample Z. 

This algorithm identifies all hitting scenarios in an efficient way. To 
illustrate its efficiency compared to Algorithm I, consider that r = 10 and 
|j( s )| = 10 for all s G (10). Then, applying Formula ([8]) in Algorithm 
I requires storing in memory the weights of all 10 10 hitting scenarios. In 
contrast, the implementation of (j23[) requires saving only 10 x 10 weights. 
This improvement is critical in practice since it allows us to handle large, 
realistic models. 

Table [1] demonstrates the running times of Algorithm II as a function 
of the dimensions n x p of the matrix A. It is based on a discretized 2-d 
Smith model (Section 13. 2p and measured on an Intel (R) Core(TM)2 Duo 
CPU E4400 2.00GHz with 2GB RAM. It is remarkable that the times scale 
linearly in both n and p. 

3 Examples 

3.1 MARMA processes 

In this section, we apply our result to the max-autoregressive moving average 
(MARMA) processes studied by Davis and Resnick [6]. A stationary process 
{Xt}tez is a MARMA(m, q) process if it satisfies the MARMA recursion: 

X t = faXt-i V • • • V 4> m X t -m V Z t V 0iZ t _i V • • • V 9 q Z t . q , (26) 
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Table 1: Means and standard deviations (in parentheses) of the running 
times (in seconds) for the decomposition of the hitting matrix H, based on 
100 independent observations X = A Z, where A is an (n x p) matrix 
corresponding to a discretized Smith model. 



p\n 



2500 
10000 



1 5 10 50 



0.03 (0.02) 0.13 (0.03) 0.24 (0.04) 1.25 (0.09) 
0.11 (0.04) 0.50 (0.05) 1.00 (0.08) 4.98 (0.33) 



for all t E Z, where fa > 0, 9j > 0,i = l,...,m,j = l,...,q are the 
parameters, and {Zt}t^z are i.i.d. 1-Frechet random variables. Proposition 
2.2 in [6] shows that, (|26p has a unique solution in form of 

oo 

Xt = \J tpjZt-j < oo , almost surely, (27) 

j=0 

with ipj > 0, j > 0, J27Lo Tpj < °°5 if an d only if 4>* = V2=i 0i < 1- In this 
case, 

fc=0 

where {a.j}j & i are determined recursively by ay = for all j < 0, ao = 1 
and 

Oij = <f>i(Xj-\ V 4>2Uj-2 V • • • V </> m aj_ m , Vj > 1 . (28) 

In the sequel, we will focus on the MARMA process (|26p with unique station- 
ary solution (j27|) . In this case, the MARMA process is a spectrally discrete 
max-stable process. Without loss of generality, we also assume {Zk}kez to 
be standard 1-Frechet. 

We consider the prediction of the MARMA process in the following 
framework: suppose at each time t E {1, . . . , n} we observe the value Xt of 
the process, and the goal is to predict {X s } n<s < n+ N. We do so by generating 
i.i.d. samples from the conditional distribution {X s } n<s < n+ N \ {Xt}t=\,...,n- 
To apply our result, it suffices to provide a max-linear representation of this 
model. We will truncate (l27l) to obtain 



X t =\/'<P j Zt- j ,Vt = l,...,n + N. (29) 
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The truncated process can approximate the original one arbitrarily well, if 
we take p large enough. Indeed, by using the independence and max-stability 
of the Zt's, one can show that 



3=0 i=p+i 



1 



ET=o^j 



i, 



(30) 
as p — > oo. Moreover, by induction on a.j in (|28p . one can show that a.j < 
(^)*y3l m \ f or a u j £ pj ; anc i thus the convergence (|30p above is geometrically 
fast. 

Now, we reformulate the prediction problem with the model (j29[) as 
follows: 

observe Xn n i = yl Z, and predict Yrtjyi = B Z | X[ ln ] , 

with the notation Xn n i = (Xi, 

Z = (Zi_ p , Z 2 _ p , . . . , Z n+N ). Here, ^4 G 
determined by (1291) . In particular, 



.4 





V o 








>-Xn), Y[ 1)A r] = (X n+ \,... ,X u+ n) and 


^o ■■■ \ 

••• Vo o ••• 






• (31) 


ipp ip p -i ■■■ Vo / 





In practice, given the observations Xn n i, we use our algorithm to sample 
from the conditional distribution Z | Xr ln i. Therefore, we can sample 



Y[i,jv] | X[i >n ] 



B Z | X [lin] . (32) 

Our approach is different from the prediction considered in [6] , which we 
will briefly review. Davis and Resnick took the classic time series point of 
view and investigated how to approximate X s by a max-linear combination 
of {Xt}t=i,...,ni w.r.t. a certain metric d. Namely, for all Y £ "H with 



H 



oo oo 



they considered a projection of Y onto the space J- n , max- linearly spanned 
by {X t }t=i,...,n- Fn = {\Jf =0 bjX n ^j : bj > 0,Y^f =0 b j < oo}- That is, 
consider the projection V n Y defined by 



V n Y = argmin^^ d(Y , Y) 
17 



(33) 



with the metric d induced by d(Vj a j^ji V ' j PjZj) = Ylj \ a j ~ Pj\- F° r s P e_ 
cific MARMA processes, [6] provided predictors based on the projection ([33]) . 
We will refer to these predictors as the projection predictors. 

In general, the conditional samplings reflect the conditional distribu- 
tion (|32p . and they provide more information than the projection predictors. 
Sampling multiple times from (|32p . we can calculate e.g., conditional me- 
dians, conditional means, quantiles, etc., which are optimal predictors with 
respect to various loss functions. 

Example 4 (MAR(m) processes). Consider the MAR(m) =MARMA(m,0) 
process with 

X t = <£iX t _i V • • • V (j} m X t -m V Z t . (34) 

The projection predictor for this model can be obtained recursively by 

Xt+k = 4>iXt+k-i v • • • V 4> m X t +k-m , (35) 

with X t = X t ,t = l,...,n (see [6], p. 799). 

Figure [2] illustrates an application of our conditional sampling algorithm 
in this case. Consider an MAR(3) process {Xt}^ with (f>\ = 0.7, 4>2 = 0.5 
and 03 = 0.3. In effect, we use the truncated model {X t } tG pj in (|29l) with 
p = 500, but we still write Xt for the sake of simplicity. Treating the 
first 100 values as observed, we plot the projection predictor, conditional 
upper 95%-quantiles and the conditional medians of {X s }^° 101 based on 
500 independent samples from the conditional distribution. 

Observe that the value of the projection predictor in Figure [2] is always 
below the conditional median. This "underestimation" phenomenon was 
typical in all the simulations we performed. It can be explained by the fact 
that, the projection predictor in (j35[) does not account for the jumps of the 
process caused by new arrivals {^t}t>ioo- Indeed, a large new arrival Zt will 
cause the process to jump immediately to Z t at time t, but this will never 
occur for the projection predictor Xt- 

Next, we apply our algorithm to examine the bias of the projection 
predictor. To do this, for each generated MARMA process, we calculated 
the cumulative probability that the projection predictor corresponds to, for 
each location s = 101, . . . , 150. Namely, using 500 independent samples 
{Xs }^° 101 , k = 1, . . . , 500 from the conditional distribution, we calculated 



1 500 
P(A S < X s | {Xt}™!) « — -£>{*i fc) < X s },Vs > 100, (36) 



fc=i 
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Prediction of MARMA(3,0) processes 



MARMA process 
Conditional 95% quantile 
Conditional median 
Projection predictor 




~r~ 

50 



100 



150 



Figure 2: Prediction of a MARMA(3,0) process with <j)\ = 0.7, 4>2 = 0.5 and 
03 = 0.3, based on the observation of the first 100 values of the process. 



where X s is the projection predictor in (|35|) . This procedure was repeated 
1000 times for independent realizations of {X t }j£° and the means of the 
(estimated) probability in (|36p are reported in Table [2j Note that as the 
time lag increases, the conditional quantiles of the projection predictors 
decrease. In this way, our conditional sampling algorithm helps quantify 
numerically the observed underestimation phenomenon in Figure [2j 

Finally, we compare the generated conditional samples to the true pro- 
cess values at times s = 101, . . . , 150. Our goal is to demonstrate the validity 
of our conditional sampling algorithm. The idea is that, at each location 
s = 101, . . . , 150, the true process should lie below the predicted 95% upper 
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Table 2: Cumulative probabilities that the projection predictors correspond 
to at time 100 + 1, based on 1000 simulations. 



t 


1 


2 


3 


4 


5 


10 


20 


30 


40 


mean 


70.6% 


50.3% 


35.6% 


25.3% 


17.8% 


2.9% 


0.1% 


0% 


0% 



confidence bound of X s | {Xj}J£° , with probability at least 95%. (Note that 
due to the presence of atoms in the conditional distributions, the coverage 
probability may in principle be higher than 95%.) Motivated by this, we 
repeat the procedure in the previous paragraph and record the proportion 
of the times that X s is below the predicted confidence quantile, for each s. 
We refer to these values as the coverage rates. As discussed, the coverage 
rates should be close to 95%. This is supported by our simulation result, 
shown in Table O 

Table 3: Coverage rates (CR) and the widths of the upper 95% confidence 
intervals at time 100 + t, based on 1000 simulations. 



t 


1 


2 


3 


4 


5 


10 


20 


30 


40 


CR 

width 


0.956 
13.06 


0.952 
26.6 


0.954 
37.8 


0.957 
45.6 


0.966 
51.2 


0.947 
62.8 


0.943 
66.0 


0.951 
66.2 


0.955 
65.4 



Table [3] also shows the widths of the upper 95%-confidence intervals. 
Note that these widths are not equal to the upper confidence bounds, given 
by the conditional 95%-quantiles, since the left end-point of the conditional 
distributions are greater than zero. When the time lag is small, the left end- 
point is large and the widths are small, due to the strong influence of the past 
of the process {-X"t}J£i. On the other hand, because of the weak temporal 
dependence of the MAR(3) processes, this influence decreases fast as the lags 
increase. Consequently, the conditional distribution converges to the uncon- 
ditional one, and the conditional quantile to the unconditional one. Note 
that the (unconditional) 95%-quantile of X s for the MARMA process (|27p 
can be calculated via the formula 0.95 = F(o~Z < u) = exp(— au~ l ), with 
a = X^o^Ar F° r ^ ne MAR(3) process we chose, we have a = 3.4 and the 
95%-quantile of X s equals 66.29. This is consistent with the widths in Table 
[3] for large lags. 

Remark 3. As pointed out by an anonymous referee, in this case one can 
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directly generate samples from {X s }g =n+1 | {JQ}™ =1; by generating indepen- 
dent Frechet random variables and iterating \34\ ). We selected this example 
only for illustrative purpose and to be able to compare with the projection 
predictors in [6]. One can modify slightly the prediction problem, such that 
our algorithm still applies by adjusting accordingly \31\) . while both the pro- 
jection predictor and the direct method by using j3^| ) do not apply. For 
example, consider the prediction problem with respect to the conditional dis- 
tribution ^{{X s } 2 ^ +1 e ■ | {X t : t = l,3,...,2n- 1}) (prediction with 
only partial history observed) or P({X S }™~ 2 € • | X\,X n ) (prediction of the 
middle path with the beginning and the end-point (in the future) given). In 
other words, our algorithm has no restriction on the locations of observa- 
tions. This feature is of great importance in spatial prediction problems. 

3.2 The Discrete Smith Model 

Consider the following moving maxima random field model in M. 2 : 

X t = [ <f>(t - u)M a (du), t = (t 1 ,t 2 )GM 2 , (37) 

where M a is an a-Frechet random sup-measure on M 2 with the Lebesgue 
control measure. Smith [21] proposed to use for (p the bivariate Gaussian 
density: 

#*!, t 2 ) ■= - Pl J^ exp { - 1 [(3lt\ - 2p0 1 p 2 t 1 t2 + $4] }, 

27iy l - p L 2 ( 1 ~ p) J 

(38) 
with correlation p S (—1,1) and variances af = 1//3 2 , i = 1,2. Consis- 
tent and asymptotically normal estimators for the parameters p, /3\ and 02 
were obtained by de Haan and Pereira [12]. Here, we will assume that these 
parameters are known and will illustrate the conditional sampling methodol- 
ogy over a discretized version of the random field (j37|) . Namely, we truncate 
the extremal integral in (I37p to the square region [— M, M] 2 and consider a 
uniform mesh of size h := M/q, gGN. We then set 

X t := y h 2 / a (f>{t-n jlJ2 )Z jlJ2 , (39) 

-9<jij2<g-l 

where u jlh = ((ji + l/2)h,(j 2 + l/2)/») and h 2 ' a Z hJ2 = M a ({jih,{ji + 
l)h] x (J2h, (J2 + 1)/*])- This discretized model ([39]) can be made arbitrarily 
close to the spectrally continuous one in (j37|) by taking a fine mesh h and 
sufficiently large M (see e.g. [26]). 
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Suppose that the random field X in (j39j) is observed at n locations Xt 



, we have the max-linear 
V j—x, p = q 2 ■ By sampling 
we can predict the random 



x h tj € [-M, M] 2 , % = 1, . . . , n. In view of ([551 
model X = A©Z, with X = (X t J™ =1 and Z = (Z, 
from the conditional distribution of Z | X = x 
field X s at arbitrary locations s£R 2 . 

To illustrate our algorithm, we used the model (|39[) with parameter val- 
ues p = 0, Pi = /?2 = 1, M = A,p = q 2 = 2500, and n = 7 observed locations. 
We generated N = 500 independent samples from the conditional distribu- 
tion of the random field {^ s }> where s takes values on an uniform 100 x 100 
grid, in the region [—2, 2] x [—2, 2]. We have already seen four of these real- 
izations in Figured! Figure [3] illustrates the median and 0.95-th quantile of 
the conditional distribution. The former provides the optimal predictor for 
the values of the random field given the observed data, with respect to the 
absolute deviation loss. The marginal quantiles, on the other hand, provide 
important confidence regions for the random field, given the data. 

Certainly, conditional sampling may be used to address more com- 
plex functional prediction problems. In particular, given a two-dimensional 
threshold surface, one can readily obtain the correct probability that the ran- 
dom field exceeds or stays below this surface, conditionally on the observed 
values. This is much more than what marginal conditional distributions can 
provide. 



Conditional Median of the Smith model 



Conditional Marginal Quantile of the Smith model 





Parameters:p=0, p,=' 



Para mete rs:p=Q, p,=1, fi 2 ^1, q=Q.95 



Figure 3: Conditional medians (left) and 0.95-th conditional marginal quan- 
tiles (right). Each cross indicates an observed location of the random field, 
with the observed value at right. 
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4 Proofs of Theorems [T] and [2] 

In this section, we prove Theorems [JJ and [2j We will first prove Theorem [21 
which simplifies the regular conditional probability formula ([8]) in Theo- 
rem [TJ Then, we show the simplified new formula is the desired regular con- 
ditional probability, which completes the proof of Theorem [TJ The key step 
to prove Theorem [2] is the following lemma. Write H.j = {i G (r) : hi j = 1}. 

Lemma 1. Under the assumptions of Theorem^ with probability one, 

(i) j( s > is nonempty for all s £ (r), and 

(ii) for all j € J( s \ H.j [~l I s ^ implies H.j C I s . 

Proof. Note that to show part (ii) of Lemma [TJ it suffices to observe that 
since I s is an equivalence class w.r.t. Relation f 1 1 8 [) . H.j \ I s and H.j n I s 
cannot be both nonempty. Thus, it remains to show part (i). We proceed 
by excluding several P- measure zero sets, on which the desired results may 
not hold. 

First, observe that for all i € (n), the maximum value of {aijZj}j € / r \ is 
achieved for unique j € (p) with probability one, since the Zj's are indepen- 
dent and have continuous distributions. Thus, the set 

Mi := (J { Oi dl Z jl = a i>j2 Z j2 = max a id Zj } 

iG(n>j'iJ 2 6(p),ji^j2 

has P-measure zero. From now on, we focus on the event A/"f and set j(i) = 
argmax Jg /p\OijZj for all i € (n). 

Next, we show that with probability one, i\ ~ i% implies j(i\) = jfe)- 
That is, the set 

M 2 -= \J M j)il)i2 with Mj,i u i 2 := |i(«i) / j(i2),k ~ «2 j 

has P-measure 0. It suffices to show P(A/} i j li j 2 ) = for all i\ / %i. If 
not, since (p) and (n) are finite sets, there exists A/"o C Mj^ x j 2 , such that 
i(*i) = 3\ 7^ ife) = 32 on A/o, and P(Ao) > 0. At the same time, however, 

j 

observe that i\ ~ ?2 implies rtjjj = rtj 2 j = 1, which yields 

a ik,i Z i = X ik = a ik,j(ik)^j(ik) = a ik,ik^jk ) & = 1, 2 . 

It then follows that on Ao, Zj 1 /Zj 2 = ai 1 jai 2 j 2 /(ai 2 jai 1 j 1 ), which is a 
constant. This constant is strictly positive and finite. Indeed, this is because 
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on A/"f, a,ij(i) > by Assumption A and hij = 1 implies chj > 0. Since Zj x 
and Zj 2 are independent continuous random variables, it then follows that 
P(A/- ) = 0. 

Finally, we focus on the event (N\ U A/2) c . Then, for any i\,i2 € I s , we 
have i\ ~ %2 and let i , . . . ,i n be as in (fTSj) , It then follows that j(«i) = 
i(*o) = i(n) = • • • = j(i n ) = ife)- Note that for all i G (n), /^(j) = 1 by 
the definition of j(«). Hence, j(ii) = ife) G J^ 8 • We have thus completed 
the proof. □ 

Proof of TheoremlE Since {I s } s e(r) are disjoint with [Jse(r) ^ s = ( n )' m ^ e 
language of the set-covering problem, to cover (n), we need to cover each 
I s . By part (ii) of Lemma dj any two different I Sl and I S2 cannot be covered 
by a single set H.j. Thus we need at least r sets to cover (n). On the other 
hand, with probability one we can select one j s from each j( s > (by part 
(i) of Lemma [1]), which yields a valid cover. That is, with probability one, 
t = r(J(H)) and any valid minimum-cost cover of (n) must be as in (|22p . 
and vice versa. We have thus proved parts (i) and (ii). 

To show (iii), by straight-forward calculation, we have, with probability 
one, 

Y^ WJ = Yl ■ J2 W dl,-Jr 

j£j r (A,x) heJW j,.eJM 



E ■■■ E 

jlGjf 1 ) jrV-lGJ^" 1 ) 



r-1 






x { e (%/^(%) n **.&. 

i£J (r) keJ (r \{j} 

n e (%/*,&■) n **&)) =n e #°) 



s=ij e j(s) fceJ (s) \{j} s=lj e jM 



Similarly, we have 



53 «wk*,-e) = n ( E ^N s) (*,£)) • (4i) 



By plugging (|40j) and f4Tj) into ([8]) , we obtain the desired result and complete 
the proof. □ 
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Proof of Theorem [7J To prove that v in ([8|) yields the regular conditional 
probability of Z given X, it is enough to show that 



P(X eD,Ze£)= I ^(x,£)P x (dx), 
Jd 



(42) 



for all rectangles D £ 7^K n and -E/ € TZ^p . In view of Theorem [21 it is enough 
to work with z/(x, £?) given by (|23p . 

We shall prove ()42|) by breaking the integration into a suitable sum of 
integrals over regions corresponding to all hitting matrices H for the max- 
linear model X = A Q Z. We say such a hitting matrix H is nice, if j( s > 
defined in (|20p is nonempty for all s£ (r). In view of LemmaHJ it suffices to 
focus on the set H(A) of nice hitting matrices H. Notice that the set H(A) 
is finite since the elements of the hitting matrices are O's and l's. 

For all rectangles D £ IZ^n , let 

D H = jx = A z : U(A, x) = H, x € D j 

be the set of all x £ R™ that give rise to the hitting matrix H. By Lemma [1] 
(i), for the random vector X = A Z, with probability one, we have 

X= £ X1 Dh (X) 

HeH(A) 



and hence 



/ z;(x,£)P x (dx)= ^ / ^(x, J E)P x (dx). (43) 

^ D HeH(A) J Dh 

Now fix an arbitrary and non-random nice hitting matrix H £ %(j4). 
Let {I s } S £{r) denote the partition of (n) determined by (fTB|) and let 

j( s ), J (s) , a = l,...,r be as in ([20]). Recall that jW C J (s) and the 

— (s) 
sets J , s = 1, . . . ,r are disjoint. 

Focus on the set Dh C R™ . Without loss of generality, and for notational 
convenience, suppose that s £ I s , for all s = 1, . . . , r. That is, 

h = {Ml, 2, • • • )»l,fci}) ^2 = {2,i2,2, • • • , ^2,fc 2 }' "' > Ir = { r ,ir,2, ■ ■ ■ ,i r ,k r }- 

Define the projection mapping Vh '■ Dh — > R+ onto the first r coordi- 
nates: 

Ph(xi,... ,X n ) = (xi,...,X r ) =X r . 
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Note that Vh, restricted to Djj is one-to-one. Indeed, for all i G I s , we 
have Xi = o,i t jZj and x s = a s> j~Zj, for all j G j( s > (recall (|20p ). This implies 
%i = (cLij/a s j)x s , for all i G I s and all s = 1, . . . ,r. Hence, "P#(x) = 'Ph(x) 
implies x = x. 

Consequently, can write x = - P~ 1 (x r ), x r G V(Du), and 

/ i/(x,£)P x (dx) = / i/(x, J E)Q Xr (dxi...dx r ), 

JDj, JVh(Dh) 

where Q^ r := P x o Vjj is the induced measure on the set Vh(Dh)- 

Lemma 2. The measure Q H r has a density with respect to the Lebesgue 
measure on the set Vh{Dh)- The density is given by 

Q| r (dx r ) = 1 Vh{ d h) {^) n £ ^ S) (x)^ • • • ^. (44) 

The proof of this result is given below. In view of (|44p and (|23p . we obtain 

i/(x,£)Q^(dx r ) 

Ph(D h ) 

A, m .111 ^ — rfr^ — ) x ll ^ ^ (x) ^7'"T" 



JVh{D h ) s= i . f=7(s) 



which equals 






E / (D1 fW:Wi"(x,^-^. (45) 

= :/(jl,...J r ) 

Fix ji G J' , • • • , jV G J^ and focus on the integral I(ji, ■ ■ ■ ,j r )- Define 
^h( d h) ■= \(zji,---,Zj r ) ■ Zj s =x s /a sJa , s = l,...,r,x r = (x s ) r s=1 G V H {D H )Y 
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We have, by ([23]) . ([25]) . and replacing x s with a s j s Zj s , s = 1, ... ,r (simple 
change of variables), 



J(jl,--- ,ir) 

= / n(%w n ^( ? *) 



fceJ (s) \{j s } 



fce7 (s) \{i s } 



^(o fl ) s=1 

x [] P(Z fc E7r fc (S),Z fc <Sfc)d^i-"d^r- ( 46 ) 

*e(p)\{ji,...jr} 

Define 

^H;n,...,jri D H) = {zer + :x = i0zeD i f, 

z is = x s /a s j s ,s = l,...,r,z k < z k (x), k e (p) \ {ji, . . . , jV } | - 
By the independence of the Z&'s, ([36]) becomes 

i(ji, ••.,>) = p(z e n H . ju ... dr (D H ) n #) . (47) 

By plugging ([37]) into ([35]) . we obtain 






x,£)P x (dx) = I ^(x,^)Q Xr (dx r 



Y^ P(Ze% li ... Jr (Dfl)nB) = P(A0ZeD i j,Ze£), 

(48) 

because the summation over {jx, ■ ■ ■ ,jr) accounts for all relevant hitting 
scenarios corresponding to the matrix H. Plugging (j48|) into (|43|) . we have 

/ z;(x,£)P x (dx) = J^ PfXEiGZGD//, ZG£)=P(XGD,ZeE) 

This completes the proof of Theorem [TJ □ 
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Proof of Lemma^ Consider the random vector X r = (X\, . . . ,X r ). Ob- 
serve that by the definition of the set Vh(Dh), on the event {X r € 
Vh(Dh)}, we have 

( a hh z h \ r 
'■ II 1 ! V a s , k Z k < a sde Z js y 

ar, jr Z Jr J s=1 keJ (s) \{j s ) 



k. r 



(49) 

i s \ 

Note that since J(s) d J , s = 1, . . . , r, the events f\=i C s ,j s are disjoint 
for all r-tuples (ji, ...,>) & J^ x • • • x j( r ). 

Recall that our goal is to establish (I44D . By the fact that the sum in (I49D 
involves only one non-zero term for some (j\, . . . ,j r ), with probability one, 
we have that for all measurable set A C Vh(Dh), writing £j g = a s j 3 Zj a , 

Q^(A)EP(X r eA) 

r 

E P({fev4)eA}n(nc SJl )). (50) 

Now, consider the last probability, for fixed (ji, ■ ■ ■ ,j r )- The ran- 
dom variables £j a , s = l,...,r are independent and they have densities 
fZj s (x s /a s> j s )/a Sj j g , x s G R + . We also have that the events C s j a , s = 

l,...,r are mutually independent, since their definitions involve Z k s in- 

—t s \ 
dexed by disjoint sets J , s = 1, . . . , r. By conditioning on the £j s 's, we 



obtain that the probability in the right-hand side of (|50p equals 

a s hZ k < x s )dx 1 ■■■dx r 



/ A (n^-/(v^))xiiK v 

S=l ,JS S=l ,^I U . i 

= / II ( f(xs/a s , js ) ]~[ i ? z fc (xs/a s ,fc))dxi ■ ■ ■ dx 



fc6J w \^.} 



In view of flU and ([MD, replacing Ej ie j(i), ..., j re jM TT s =i ■ ■ ■ b Y 
QI=i ( SjeJM ' ' ' )> we obtain that the measure Q H r has a density on 
V(D H ), given by dHJ). D 
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A Regular conditional probability 

We recall here the notion of regular conditional probability. Let Z = 
(Zi,. . . ,Z p ), X = (X\,. . . ,X n ), and let S R p denote the Borel cr-algebra 
on PJj_. The regular conditional probability v of Z given <r(X), is a function 
from B R p x M. n to [0, 1], such that 

(i) i/(x, ■) is a probability measure, for all x € M. n , 

(ii) The function u(-, E) is measurable, for all Borel sets E £ 23rp. 

(iii) P(Z € £, Xefl) = / D z;(x,£)P x (dx), for all £ € S RP and £> € £ R n, 
where P x (-) := P(X G •)• 

See e.g. Proposition A 1.5. Ill in [5] for more details. 

In Section [21 we provided an expression for the regular conditional prob- 
ability in the max- linear model ([3]): 

y(x,E):=P(Ze^|X = x), BeB R P,xeR n + . (51) 

The definition of v implies that 

/ #(z)i/(X,dz) = E(s(Z) | o-(X)), P x -almost surely, 

for all Borel functions g : MP — > M with E|g(Z)| < oo. By the strong law of 
large numbers, the latter conditional expectations are readily approximated 
by TV -1 X^i=i ^(Z )j where 2i^\ i = 1, . . . , N are independent samples from 
the regular conditional probability i/(X, dz). Thus, ^ is the right distribution 
to sample from when performing prediction, given prior observed data. 
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